Optimal detection of changepoints with a linear computational cost
Authors
R. Killick is Senior Research Associate, Department of Mathematics & Statistics, Lancaster University, Lancaster, UK. P. Fearnhead is Professor, Department of Mathematics & Statistics, Lancaster University, Lancaster, UK. I.A. Eckley is Senior Lecturer, Department of Mathematics & Statistics, Lancaster University, Lancaster, UK. The authors are grateful to Richard Davis and Alice Cleynen for providing the Auto-PARM and PDPA software respectively. Part of this research was conducted whilst R. Killick was a jointly funded Engineering and Physical Sciences Research Council (EPSRC) / Shell Research Ltd graduate student at Lancaster University. Both I.A. Eckley and R. Killick also gratefully acknowledge the financial support of the EPSRC grant number EP/I016368/1. arXiv:1101.1438v3 [stat.ME], 9 Oct 2012.
Abstract
We consider the problem of detecting multiple changepoints in large data sets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example in genetics as we analyse larger regions of the genome, or in finance as we observe time-series over longer periods. We consider the common approach of detecting changepoints through minimising a cost function over possible numbers and locations of changepoints. This includes several established procedures for detecting changepoints, such as penalised likelihood and minimum description length. We introduce a new method for finding the minimum of such cost functions, and hence the optimal number and location of changepoints, that has a computational cost which, under mild conditions, is linear in the number of observations. This compares favourably with existing methods for the same problem whose computational cost can be quadratic or even cubic. In simulation studies we show that our new method can be orders of magnitude faster than these alternative exact methods.
We also compare with the Binary Segmentation algorithm for identifying changepoints, showing that the exactness of our approach can lead to substantial improvements in the accuracy of the inferred segmentation of the data.
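The idea of minimising a penalised cost over the number and locations of changepoints can be illustrated with a small sketch. The abstract does not fix a cost function, so the code below assumes a squared-error cost for changes in mean, a fixed penalty `beta` per changepoint, and a pruning constant of zero. The function name `pelt_mean_change` is hypothetical. It is a minimal dynamic-programming sketch with candidate pruning, not the authors' implementation.

```python
def pelt_mean_change(data, beta):
    """Exact penalised-cost segmentation for changes in mean (illustrative sketch).

    Assumptions (not from the abstract): squared-error segment cost,
    penalty `beta` per changepoint, pruning constant K = 0.
    Returns (changepoints, total_cost); a changepoint c means the
    segments split as data[:c] and data[c:].
    """
    n = len(data)
    # Prefix sums give each segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, y in enumerate(data):
        s1[i + 1] = s1[i] + y
        s2[i + 1] = s2[i] + y * y

    def cost(s, t):
        # Residual sum of squares of data[s:t] around its own mean.
        total = s1[t] - s1[s]
        return (s2[t] - s2[s]) - total * total / (t - s)

    best_cost = [0.0] * (n + 1)   # best_cost[t]: optimal penalised cost of data[:t]
    best_cost[0] = -beta          # so the first segment carries no penalty
    last_cp = [0] * (n + 1)       # last changepoint before t in the optimum
    candidates = [0]              # candidate positions for the last changepoint

    for t in range(1, n + 1):
        values = [best_cost[s] + cost(s, t) + beta for s in candidates]
        k = min(range(len(candidates)), key=values.__getitem__)
        best_cost[t] = values[k]
        last_cp[t] = candidates[k]
        # Pruning: drop s once best_cost[s] + cost(s, t) exceeds best_cost[t];
        # such an s cannot be the optimal last changepoint at any later time.
        candidates = [s for s, v in zip(candidates, values)
                      if v - beta <= best_cost[t]]
        candidates.append(t)

    # Backtrack through last_cp to recover the changepoint locations.
    changepoints = []
    t = n
    while last_cp[t] > 0:
        changepoints.append(last_cp[t])
        t = last_cp[t]
    return sorted(changepoints), best_cost[n]


if __name__ == "__main__":
    series = [0.0] * 50 + [5.0] * 50
    print(pelt_mean_change(series, beta=10.0))
```

Without the pruning step the loop is the quadratic-cost recursion over all earlier positions. The pruning is what keeps the candidate set small when changepoints occur regularly through the data, which is the setting the abstract targets.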
Similar resources
On optimal multiple changepoint algorithms for large data
Many common approaches to detecting changepoints, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistica...
The use of cumulative sums for detection of changepoints in the rate parameter of a Poisson Process
This paper studies the problem of multiple changepoints in the rate parameter of a Poisson process. We propose a binary segmentation algorithm in conjunction with a cumulative sums statistic for detection of changepoints, such that in each step we need only test for the presence of a simple changepoint. We derive the asymptotic distribution of the proposed statistic, prove its consistency and obtain ...
Detection of changes in variance using binary segmentation and optimal partitioning
This work explores the performance of binary segmentation and optimal partitioning in the context of detecting changes in variance for time-series. Both binary segmentation and optimal partitioning are based on cost functions that penalise a large number of changepoints in order to avoid overfitting. Analysis is performed on simulated time-series; first on Normal data with constant but unknown...
Change detection from satellite images based on optimal asymmetric thresholding the difference image
As a process for detecting changes in land cover using multi-temporal satellite images, change detection is one of the practical subjects in the field of remote sensing. Any progress on this issue increases the accuracy of results, facilitates and accelerates the analysis of multi-temporal data, and reduces the cost of producing geospatial information. In this study, an unsupervised chang...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), one intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...